A Unified Statistical Model for Generalized Translation Memory System
نویسندگان
چکیده
We introduced, for Translation Memory System, a statistical framework, which unifies the different phases in a Translation Memory System by letting them constrain each other, and enables Translation Memory System a statistical qualification. Compared to traditional Translation Memory Systems, our model operates at a fine grained sub-sentential level such that it improves the translation coverage. Compared with other approaches that exploit sub-sentential benefits, it unifies the processes of source string segmentation, best example selection, and translation generation by making them constrain each other via the statistical confidence of each step. We realized this framework into a prototype system. Compared with an existing product Translation Memory System, our system exhibits obviously better performance in the "assistant quality metric" and gains improvements in the range of 26.3% to 55.1% in the "translation efficiency metric". 1 This work was finished while the authors were working at Microsoft Research Asia (MSRA).
منابع مشابه
Towards a Unified Approach to Memory- and Statistical-Based Machine Translation
We present a set of algorithms that enable us to translate natural language sentences by exploiting both a translation memory and a statistical-based translation model. Our results show that an automatically derived translation memory can be used within a statistical framework to often find translations of higher probability than those found using solely a statistical model. The translations pr...
متن کاملAn EM Algorithm for Estimating the Parameters of the Generalized Exponential Distribution under Unified Hybrid Censored Data
The unified hybrid censoring is a mixture of generalized Type-I and Type-II hybrid censoring schemes. This article presents the statistical inferences on Generalized Exponential Distribution parameters when the data are obtained from the unified hybrid censoring scheme. It is observed that the maximum likelihood estimators can not be derived in closed form. The EM algorithm for computing the ma...
متن کاملModel Selection Based on Tracking Interval Under Unified Hybrid Censored Samples
The aim of statistical modeling is to identify the model that most closely approximates the underlying process. Akaike information criterion (AIC) is commonly used for model selection but the precise value of AIC has no direct interpretation. In this paper we use a normalization of a difference of Akaike criteria in comparing between the two rival models under unified hybrid cens...
متن کاملPower Function Distribution Characterized by Dual Generalized Order Statistics
The dual generalized order statistics is a unified model which contains the well known decreasingly ordered random data such as (reversed ordered) order statistics and lower record values. In the present paper, some characterization results on the power function distribution based on the properties of dual generalized order statistics are provided. The results are proved without any rest...
متن کاملA Unified Model for Soft Linguistic Reordering Constraints in Statistical Machine Translation
This paper explores a simple and effective unified framework for incorporating soft linguistic reordering constraints into a hierarchical phrase-based translation system: 1) a syntactic reordering model that explores reorderings for context free grammar rules; and 2) a semantic reordering model that focuses on the reordering of predicate-argument structures. We develop novel features based on b...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2003